MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities
Authors
Abstract
Metagenome binning, the grouping of large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of these tools do not scale to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins from a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open-source software and is available at https://bitbucket.org/berkeleylab/metabat.
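To make the tetranucleotide frequency signal mentioned in the abstract concrete, the following is a minimal Python sketch of how such a profile could be computed for a contig. It is not MetaBAT's implementation: the function names are hypothetical, reverse-complement tetramers are not merged, and a plain Euclidean distance stands in for the empirical probabilistic distance the abstract describes.

```python
from collections import Counter
from itertools import product
import math

# All 256 possible tetranucleotides over the DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of all 256 tetranucleotides in seq.

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in TETRAMERS}


def euclidean_tnf_distance(seq_a, seq_b):
    """Euclidean distance between two tetranucleotide profiles.

    Assumption: this is a simple stand-in, not the empirical
    probabilistic distance that MetaBAT uses.
    """
    fa = tetranucleotide_frequencies(seq_a)
    fb = tetranucleotide_frequencies(seq_b)
    return math.sqrt(sum((fa[k] - fb[k]) ** 2 for k in TETRAMERS))
```

Contigs from the same genome tend to have similar profiles, so a small distance is weak evidence that two contigs belong in the same bin; the abstract indicates that MetaBAT combines this signal with abundance information rather than relying on it alone.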
Similar resources
A robust statistical framework for reconstructing genomes from metagenomic data
We present software that reconstructs genomes from shotgun metagenomic sequences using a reference-independent approach. This method permits the identification of OTUs in large complex communities where many species are unknown. Binning reduces the complexity of a metagenomic dataset enabling many downstream analyses previously unavailable. In this study we developed MetaBAT, a robust statistic...
Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.
Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new...
Recovering Genomics Clusters of Secondary Metabolites from Lakes Using Genome-Resolved Metagenomics
Metagenomic approaches became increasingly popular in the past decades due to decreasing costs of DNA sequencing and bioinformatics development. So far, however, the recovery of long genes coding for secondary metabolites still represents a big challenge. Often, the quality of metagenome assemblies is poor, especially in environments with a high microbial diversity where sequence coverage is lo...
Species Specific DNA Profiling Mycobacterial Genomes Using Polymerase Chain Reaction with Single Universal Primer (UP-PCR)
Three tuberculous, twenty-one non-tuberculous mycobacterial (NTM) reference strains and seventy two isolates classified by biochemical tests were shown to produce specific sets of DNA fragments in a polymerase chain reaction with single universal primer (UP-PCR). A rather wide limit of tolerance for variations in procedure of PCR mixture preparation and thermocycling parameters was found. There...
Genome-reconstruction for eukaryotes from complex natural microbial communities.
Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. In order to include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of ...